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ABSTRACT 


V”A  three-dimensional  version  of  the  Beam-Warming  scheme  for  solving  the 
compressible  Navier-Stokes  equations  was  implemented  on  the  Cray-1  computer. 
The  scheme  is  implicit  and  second-order  accurate.  The  code  is  totally  vec¬ 
torized  ,  allows  for  complicated  geometries  and  includes  a  thin  layer  turbu¬ 
lence  model.  Timings  and  comparisons  are  given.  A  preliminary  discussion  of 
the  full  viscous  model  is  included. 

\ 


I.  INTRODUCTION 


In  recent  years  substantial  progress  has  been  made  in  the  area  of  numeri¬ 
cal  simulation  of  fluid  flows.  However,  one  major  limitation  has  been  the  size 
and  speed  of  the  available  computers.  Now  the  new  generations  of  computers,  in¬ 
cluding  the  ILLIAC  IV,  the  Cray-1,  and  the  CYBER  205,  use  vector  architectures 
and  thus  offer  new  possibilities  in  the  area  of  fluid  flow  simulation. 

The  major  purpose  of  this  project  was  to  implement  the  Beam-Warming  scheme 
for  solving  the  Navier-Stokes  equations  [1]  on  the  Cray-1.  More  precisely,  the 
plan  was  to  combine  the  best  parts  of  the  codes  written  by  Steger  and  Pulliam 
[2]  for  the  NASA- Ames  CDC  7600  and  by  Lomax  and  Pulliam  [3]  for  the  NASA-Ames 
ILLIAC  IV,  so  as  to  take  maximum  advantage  of  the  Cray-1.  Our  task  was  compli¬ 
cated  by  the  fact  that  the  major  loops  in  the  two  codes  were  in  opposite  orders. 
However,  we  were  able  to  capture  all  of  the  important  vector  operations  of  the 
ILLIAC  code  and  accomplish  our  goal. 

A  secondary  goal  of  the  project  was  to  include  the  full  viscous  effect  in 
the  above  described  code.  The  implementation  of  this  goal  has  not  been  completed. 


2.  EQUATIONS  IN  NONDIMENS IONAL  FORM 

We  are  interested  in  the  numerical  simulation  of  unsteady,  three-dimensional 
flows  of  a  compressible,  viscous  fluid  in  an  arbitrary  geometry.  We  wish  to  use 
a  grid  generating  scheme  so  we  assume  that  the  geometry  of  the  physical  problem 
given  in  x-y-z  space  has  been  mapped  onto  a  rectangular  parallelpiped  in  the  £- 
n-5  space  and  that  the  metrics  £x,  £y,  £z ,  nx,  ny,  nz,  Cx,  cy,  T,z  and  the  Jaco¬ 
bian  J  of  the  mapping  are  provided.  (For  work  on  grid  generating  schemes  see 
[A],  [5]  or  [6].)  Hence  we  must  solve  the  following  system  of  equations. 
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2ywz;  p,  u»  v,  w,  e,  p  are  the  density,  velocity  components.  Internal  energy 
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and  pressure  (p  ■  (y— 1)  [e  -  .5p(u  +  v  +  w  )]),  respectively;  y,  X,  k,  Pr, 

Re  are  the  dynamic  viscosity,  kinematic  viscosity,  coefficient  of  thermal 
conductivity,  Prandtl  number,  and  Reynolds  number;  \ 
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E  ,  P  ,  and  G  are  E,  F,  and  G  evaluated  at  the  free  stream  values  of  a. 
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3.  THIN  LAYER  APPROXIMATION 


For  high  Reynolds  number  flows  with  the  assumptions  that  the  n-varlables  is 
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normal  to  the  mapped  rigid  boundary  and  that  ve  are  only  interested  in  a  thin 
layer  of  flow  along  the  rigid  surface  (or  equivalently  that  the  viscous  for¬ 
ces  away  from  the  surface  are  negligible),  ve  can  eliminate  the  £  and  £  deriv¬ 
ative  terms  in  the  viscous  terms  (E  ,  F  and  G  ) .  We  then  arrive  at  the  fol- 
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Equation  (2)  is  solved  by  the  Beam-Warming  scheme  [1]  or  more  specifically 
the  Steger-Pulliam  implementation  of  the  Beam-Warming  scheme  [2],  The  technique 
uses  trapezoidal  time  differencing,  expansion  of  the  nonlinear  terms  about  the 
nt^1  time  step,  and  approximately  factoring  the  resulting  equation.  We  then  ob¬ 
tain  the  following  finite  difference  equation 
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where  At  is  the  time  increment;  <5_,  6  ,  fi,  are  central  differences  with  respect 

to  the  appropriate  space  variable,  and  are  amounts  of  dissipation  added 

(second  order  and  fourth  order,  respectively);  all  functions  with  a  superscript 

n+1  n  3- 

m  denote  that  function  evaluated  at  time  mAt;  Ag  ■  g  -  g  ;  and  A  ■  B  ■ 


3F  3G 

9J’  C  ■  ij  *nd  M 


3S 

-r— .  Also  the  turbulence  model  described  in  [7]  can  be  inclu- 
3g 


-3- 


ded  in  Equation  (3)  by  varying  the  coeficients  associated  with  S.  For  an  ex¬ 
cellent  paper  describing  the  scheme  see  [7 } • 

The  scheme  coded  is  to  solve  Equation  (3).  This  process  consists  of  the 
following  four  steps:  (1)  the  explicit  computation  of  the  right-hand  side;  (2) 

-  (4)  the  solution  of  three  block  tridiagonal  systems  of  equations. 

4.  CRAY-1  CODE  FOR  THIN  LAYER  APPROXIMATION 

The  7600  code  to  solve  Equation  (3)  is  relatively  straightforward  (or  at 
least  as  straightforward  as  such  a  large  code  can  be) .  The  right-hand  side  is 
evaluated  and  then  the  three  block  tridiagonals  are  solved.  An  important  factor 
is  that  the  equation  to  be  solved  is  indexed  in  the  direction  associated  with  that 
particular  factor.  For  example,  when  solving  the  equation 

(4)  (I+^Bn-£.V4J  -  -^-M^u  *  Right-hand  side,  • 
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it  is  logical  and  saves  storage  space  to  fix  the  J  and  L  indices  (indices  associ¬ 
ated  with  the  £  and  £  directions,  respectively)  and  solve  the  resulting  system  of 
equations  indexed  by  K  (index  in  the  n  direction).  This  allows  the  7600  code  to 
solve  a  30-row  (because  of  size  limitations  30  is  the  largest  number  of  mesh 
points  used  in  any  direction)  block  (5x5  blocks)  tridiagonal. 

Unfortunately,  although  the  above  procedure  is  probably  vectorizable,  it 
is  nowhere  near  optimum  for  the  Cray-1.  The  largest  resulting  vector  would  be 
30  where  multiples  of  64  are  the  optimum  vector  sizes  on  the  Cray-1  (and  bigger 
vectors  yet  gain  speed  on  the  CYBER  205) . 

Because  of  the  vector  capabilities  of  the  ILLIAC,  the  ILL1AC  code  is  quite 
different  from  the  7600  code.  The  ILLIAC  code  is  written  in  CFO,  [8],  which  is 
similar  to  FORTRAN  so  parts  were  able  to  be  used  almost  exactly  as  in  the  ILLIAC 
code.  However,  due  to  peculiarities  of  the  ILLIAC,  much  of  the  code  is  ILLIAC 
dependent  (disc  manipulations,  manipulation  of  scalars,  scalar  arithmetic,  etc.). 
Hence,  we  developed  the  Cray  code  by  using  the  serial  operation  flow  of  the  7600 


while  retaining  all  of  the  vector  portions  of  the  ILLIAC  code  that  are  performed 
often.  For  another  Cray-1  implementation  of  the  ILLIAC  code  and  the  subsequent 
comparison,  see  [9]. 

Much  of  the  logic  in  the  ILLIAC  code  is  due  to  the  data  structure  and  man¬ 
ipulation.  One  problem  is  that  the  ILLIAC  has  only  130K  words  of  memory  so  to 
be  able  to  work  a  very  large  problem  it  is  necessary  to  use  the  disc  extensively. 
Also,  two  characteristics  of  the  ILLIAC  are  that  it  will  only  handle  vectors  of 
length  64  or  less  (and  it  is  inefficient  to  use  less)  and  it  is  very  difficult 
(nearly  impossible)  to  transfer  any  information  down  the  vector.  Because  of  these 
limitations  the  data  structure  used  in  the  ILLIAC  code  is  to  partition  the  grid 
into  8x8x8  blocks.  A  row  of  these  blocks  in  a  given  direction  is  then  called 
a  pencil  in  that  direction.  Because  of  the  space  limitation  of  the  memory,  the 
scheme  is  to  bring  a  pencil  at  a  time  into  core.  The  sdlution  scheme  involves 
first  sweeping  through  each  of  the  £-  pencils  and  with  the  data  in  the  machine 
where  the  T)-£  directions  are  the  components  of  the  vector  (a  4-pencil  will  con¬ 
tain  JMAX  8x8  n-c  planes),  calculate  the  necessary  4  differences  in  the  right- 
hand  side  of  Equation  (3).  The  next  two  steps  involve  doing  the  same  thing  with 
the  n  md  £  pencils.  These  steps  are  necessary  with  the  ILLIAC  since  the  machine 
will  not  allow  arithmetic  along  a  vector.  This  implementation  is  convenient  since 
it  makes  the  right-hand  side  calculations  vector  operations.  The  next  three  steps 
involve  sweeping  through  the  pencils  in  each  of  the  directions  again,  at  each  step 
solving  the  appropriate  block  tridiagonal  matrix.  For  example.  Equation  (4)  would 
be  solved  while  sweeping  the  q-pencils.  Since  the  left-hand  side  of  Equation  (4) 
involves  only  q  differences,  the  vector  contains  an  8  x  8  piece  of  a  £-(  plane. 
This  makes  solving  Equation  (4)  a  perfect  vector  operation.  We  might  add  that 
for  each  of  these  sweeps  the  data  must  be  in  the  machine  in  a  different  order  (to 
make  the  operations  vector  and  to  make  the  matrices  block  tridiagonals).  Hence, 
some  of  the  sweeps  also  Include  the  appropriate  transposes  to  get  the  data  in  the 
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right  places.  We  should  also  add  that  by  doing  the  right-hand  side  calculation 
and  solving  the  appropriate  block  tridiagonal  matrix  during  the  same  sweep  for 
the  £  and  n  directions,  it  is  necessary  to  sweep  the  data  four  times  rather  than 
the  six  indicated  above. 

One  convenient  aspect  about  converting  the  1LLIAC  code  to  the  Cray  was  that 
the  above  described  data  structure  is  also  approximately  the  best  to  use  on  the 
Cray.  Initially  it  might  seem  that  since  the  Cray  vector  operations  have  no 
length  restrictions  (except  that  the  speed  is  optimum  when  the  vector  length  is 
a  multiple  of  64),  the  best  approach  is  to  use  pencils  that  include  the  entire 
grid  or  at  least  bigger  than  8x8  planes.  This  would  be  possible  for  problems 
if  a  disk  version  of  the  Cray  code  was  desired.  However,  the  above  scheme  for 
the  full  grid  would  require  approximately  40  x  (grid  size)  +  50  x  (largest  plane 
size)  words  of  storage  so  only  relatively  small  problems'  would  fit  on  the  Cray. 

Our  Cray  code  is  written  so  that  it  is  contained  entirely  in  core.  In  gen¬ 
eral,  the  block  tridiagonal  solver  will  require  3  x  (vector  length)  x  25  x  (maxi¬ 
mum  pencil  length  -  2)  words  of  storage.  Allowing  space  for  the  full  matrix  is 
advantageous  because  it  allows  use  of  canned  block  tridiagonal  solvers  such  as 
that  described  in  [10).  We  chose  not  to  use  this  approach.  Our  block  tridiagonal 
solver  generates  the  matrix  during  the  forward  sweep  so  if  8  x  8  pencils  are  used, 
storage  due  to  the  matrix  solver  is  minimal  (64  x  25  x  (pencil  length)  +  2)). 

Thus  we  have  used  the  same  data  structure  for  our  Cray  code  as  we  used  in 
the  ILLIAC  code.  The  storage  requirements  for  our  code  are  approximately  15  x 
(grid  size)  +  64  x  43  x  (maximum  pencil  length)  +  64  x  90  so  at  least  moderately 
large  problems  can  be  worked  using  the  code. 

Hence,  the  code  proceeds  to  solve  Equation  (3)  by  sweeping  the  data  to  form 
the  right-hand  side  and  again  sweep  the  data  to  solve  the  matrix  equations,  by 
the  same  four  sweeps  in  the  ILLIAC  code. 
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5.  THIN  LAYER  CODE  RESULTS 

It  is  very  difficult  and  somewhat  meaningless  to  compare  results  of  a  code 
such  as  this  one  on  different  machines.  As  mentioned  earlier  the  ILLIAC  code 
used  a  large  number  of  disk  manipulations.  The  7600  code  used  the  slower  large 
core  memory.  So  to  compare  times  of  the  Cray  code  (which  is  all  in  core)  might 
not  mean  much.  For  lack  of  other  fairer  comparisons  we  will  make  some  of  the 
above  comparisons. 

The  Cray-1  code  runs  at  about  4.9  seconds  per  time  step  for  a  32  x  32  x  24 
grid.  The  above  time  includes  the  starting  and  ending  overhead  which  we  assume  is 
negligible.  When  the  vector  mode  of  the  Cray  is  turned  off,  the  code  takes  16.7 
seconds  per  time  step  for  the  same  grid.  Thus  it  is  apparent  that  the  code  is 
gaining  efficiency  from  the  vectorization . 

It  should  be  noted  that  in  similar  tests  reported  in  [9]  a  Cray  code  for 
the  Beam-Warming  scheme  took  1.8  seconds  per  time  step  for  a  20  x  30  x  21  grid 
(and  1.3  seconds  when  using  the  matrix  solver  given  in  [10]).  Though  this 
seems  like  a  big  difference,  it  is  not.  It  must  be  pointed  out  that  a  32  x  32 
x  24  grid  is  almost  twice  as  big  as  a  20  x  30  x  21  grid.  Also  the  code  reported 
in  [9]  calculated  the  metrics  associated  with  the  grid  generation  once  and 
stored  them  while  our  code  recalculated  them  for  each  sweep.  Another  difference 
between  the  two  codes  is  that  their  code  used  blocks  of  size  11,  instead  of  8. 

A  consequence  of  their  time  saving  techniques  is  that  largest  grid  possible  for 
their  code  without  going  to  disk  is  30  x  30  x  38  whereas  our  code  will  allow  for 
a  32  x  32  x  48  grid  without  disk. 

An  analysis  of  the  time  used  in  the  code  showing  that  40. 5%  of  the  time  is 
spent  in  setting  up  and  solving  the  block  tridiagonal  matrices.  Thus  it  might 
be  worth  using  a  canned  block  tridiagonal  solver  if  the  storage  requirements 
make  it  possible.  It  might  also  be  worthwhile  to  develop  an  efficient  block 
tridiagonal  solver  that  does  not  require  storage  of  the  full  tridiagonal  matrix. 
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The  best  estimates  for  the  ILLIAC  and  7600  runs  of  the  analogous  material 
are  5.1  and  19.9  seconds  per  time  step  for  a  20  x  30  x  21  grid.  But  again  ve 
admit  that  these  comparisons  are  not  really  relevant. 

It  should  be  noted  that  the  thin  layer  code  was  only  run  with  the  free 
stream  problem  and  flow  about  a  hemisphere  cylinder.  For  a  discussion  of  the 
latter  problem  see  [2].  Since  our  grids  contained  of  different  number  of  grid 
points  (and  it  is  impossible  to  use  a  grid  such  as  theirs  in  our  code) ,  an 
exact  comparison  of  the  two  codes  was  not  possible.  We  did,  however,  plot  the 
values  of  p/p^  versus  x/R  as  do  Steger  and  Pulliam  in  [2]  (Figure  8)  to  at 
least  verify  that  the  pressures  on  the  body  are  the  same.  The  plots  were  so 
much  the  same  that  we  did  not  reproduce  them  here. 

6.  FULL  VISCOUS  MODEL 

As  mentioned  in  the  Introduction  the  inclusion  of  the  full  viscous  effects 
has  not  been  completed.  There  are  two  reasons  why  we  wish  to  include  the  full 
viscous  effects.  The  first  is  to  see  whether  they  are  actually  negligible.  The 
second  reason  is  to  make  it  easier  to  calculate  flows  in  certain  complex  geom¬ 
etries  (for  instance,  flows  in  corners).  To  accomplish  the  latter  goal  we  felt 
that  it  was  necessary  to  include  the  terms,  linearize,  factor  and  then  include 
a  three-dimensional  turbulence  model.  After  some  work  this  approach  was  aban¬ 
doned  as  being  too  difficult  for  the  present  timeframe. 

We  next  decided  that  it  would  be  possible  to  answer  the  first  question 
sufficiently  well  by  including  the  remaining  viscous  (the  terms  not  included  in 
the  thin  layer  approximation)  explicitly.  We  are  presently  in  the  process  of 
doing  this. 

If  we  return  to  Equations  (1)  and  (2)  it  is  not  hard  to  see  that  the  parts 
of  (1)  that  are  omitted  in  (2)  are 
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Thus  our  present  plan  Is  to  use  our  present  code  with  the  scheme  altered  so 
as  to  solve  Equation  (3)  with  a  term  V°  added  to  the  right  hand  side.  If  the  ef¬ 
fects  of  the  full  viscous  terms  are  not  negligible,  then  the  next  step  will  be  to 
include  the  terms  in  V  implicitly. 
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